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Abstract The Multiple Try Metropolis (MTM) method is a generalization of 
the classical Metropolis-Hastings algorithm in which the next state of the 
chain is chosen among a set of samples, according to normalized weights. 
In the literature, several extensions have been proposed. In this work, we 
show and remark upon the flexibility of the design of MTM- type methods, 
fulfilling the detailed balance condition. We discuss different possibilities and 
show numerical results. 
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1 Introduction 

Monte Carlo methods are powerful tools for scientific and approximate 
computing, numerical inference and optimization 6, 22J. For instance, Monte 
Carlo methods arc often necessary for the implementation of optimal Bayesian 
estimators so that several families of techniques have been proposed [7J [TU] . 
The core of the Monte Carlo approach consists of drawing random samples 
from a target probability density function (pdf). 

A very powerful class of Monte Carlo techniques is the so-called Markov 
Chain Monte Carlo (MCMC) algorithms EH EE1 HI 122] • They generate a 
Markov chain such that its stationary distribution coincides with the target 
probability density function (pdf). Typically, the only requirement is to be 
able to evaluate the target function, where the knowledge of the normalizing 
constant is usually not needed. 
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The most popular MCMC method is undoubtedly the Metropolis-Hasting 
(MH) algorithm [ill El- It can be applied to almost any arbitrary target 
distribution. However, to speed up the convergence and reduce the "burn- 
in" period, several extensions have been proposed in literature. For instance, 
the Multiple Try Metropolis (MTM) scheme [H] where, according to certain 
weights, the next state of the Markov chain is selected from a set of 
independent samples drawn from a generic proposal density. The main 
advantage of MTM is that it can explore a larger portion of the sample space 
without a decrease of the acceptance rate. Previously, a similar methodology 
was proposed in the domain of molecular simulation, called "orientational 
bias Monte Carlo" Chapter 13], where i.i.d. candidates are drawn from a 
symmetric proposal pdf and one of these is chosen according to normalized 
weights directly proportional to the target pdf. 

Due to its good performance and the attractive possibility to combine it 
with adaptive MCMC strategies (for instance using different interacting or 
adaptive proposals at the same iteration [3]), the basic formulation of the 
MTM has been modified and stressed in different ways. In [IS], the transition 
rule of the MTM algorithm is generalized such that the analytic form of the 
weights is not specified. They also study the extension of the MTM in the 
reversible jump framework. In [4 an MTM scheme with different proposal 
is introduced. Different approaches with correlated candidates have been 
suggested in [51 [UJ [2T] . Some interesting theoretical results on the asymptotic 
behavior of different MTM strategies and some considerations on the choice 
of the weights are given in [2J. 

In all the proposed MTM schemes the number of generated candidates is 
fixed, differently from the delayed rejection Metropolis algorithm [151 [27], and 
the state space is not augmented defining an extended target distribution, as 
in other MCMC methods based on auxiliary random variables [25] . 

In this work, we stress and remark upon the flexibility in the choice of 
transition rules within MTM algorithms. First of all, we mix the approaches 
from [4] and [19] , building a MTM with generic weights using different proposal 
pdfs. Then, we present a general framework for the construction of acceptance 
probabilities in MTM schemes. We show this theoretically and illustrate 
with specific examples. Owing to this flexibility, it is also possible to design 
a MTM scheme without drawing reference points [23] ■ Moreover, we also 
introduce this kind of MTM algorithm with a dcterminist reference points, 
and then discuss how this change affects its performance. We show that all the 
presented schemes fulfill the detailed balance condition and provide numerical 
comparisons. Related considerations can be found in [TJ [3J [TTJ [201 H51 [25] . 

The rest of the paper is organized as follows. In Section [2J we combine 
the schemes in [H [115] describing an MTM algorithm using different proposal 
densities and generic weight functions. In Section [3j we explain the flexibility 
in the choice of the acceptance functions, satisfying the detailed balance 
condition. Some examples of acceptance rules are shown in Section[4] Section[5] 
introduces a MTM method without generating the reference points randomly. 
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Numerical comparisons are given in Section [6] and finally we draw conclusions 
in Section [3 



2 MTM algorithm with generic weights and different proposals 

In the classical MH algorithm, a new possible state is drawn from the proposal 
pdf and the movement is accepted with a decision rule that guarantees 
fulfillment of the balance condition. In a multiple try approach, several 
(independent jT4j [T9] or correlated [El El]) samples are generated and from 
these a "good" one is chosen. 

In [4] the standard MTM is generalized using different proposal densities 
whereas in [TH] the authors extend the standard MTM considering generic 
weight functions. In the following section, we recall and mix together both 
approaches [U [T!5] providing an extended MTM algorithm drawing candidates 
from with different proposals where the weight functions are not defined 
specifically, i.e., the analytic form can be chosen arbitrarily (they must be 
bounded and positive functions). 



2.1 Algorithm 

Let p (x) be the pdf that we want to draw from and p{x) a function 
proportional to our target pdf p (x) (i.e., p{x) cx p {x)). Given a current 
state of the chain xt = x E T> C R, t 6 N, (we assume scalar values only for 
simplicity in the treatment), we draw N independent samples each step from 
different proposal pdfs, i.e., 

yi ~ ■K 1 (-\x),y 2 ~ 7r 2 (-|ac), ...,y N ~ n N (-\x). 

Therefore, we can write the joint distribution of the generated samples as 

QN{yi:N\x) = iri(yi\x)ir 2 (y 2 \x) ■ ■ -ir N (y N \x). 

Then, a "good" candidate among the generated samples is chosen according to 
weight functions w(z 1; z 2 ) £l 2 -4 M. + (where Z\ and z 2 are generic variables) 
that have to be (a) bounded and (b) positive. Given a current state x t = x, 
the algorithm can be described as follows: 

1. Draw ./V samples yi-.N = [yij J/2, Vn] from the joint pdf 

q(Vi-.N\x) = Tr 1 (y 1 \x)TT 2 (y 2 \x)TT 2 (y 3 \x) ■ ■ ■ TT N (y N \x), 

namely, draw yj from 7Tj(-|x), with j = 1, N. 

2. Calculate the weights ujj(yj, x), j = 1, N, and normalize them to obtain 
uij, j = 1, ...,N. 
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3. Draw a y = y k € {y\, ....,yjv} according to cjj, j = 1, ...,N and set (recall 
that y k = y) 

W -n - "k(y,x) n s 

W y -Uk-=jf , -• (1) 

4. Draw other auxiliary samples (often called reference points), 

for i = 1, k — 1, k + 1, ....,N, and set x k = x. 

5. Compute the corresponding weights ujj(x*,y), j = 1,...,N and set (recall 



that xX 



W„ = 



uk(x,y) 



6. Let Xt+i = y (recall that y = y k ) with probability 

p(y)w k (x\y) W x 



a(x, y) = min 



' p(x)n k (y\x) W y 



(2) 



(3) 



otherwise set x t+ i = x with the remaining probability 1 — a(x,y). 
7. Set t = t + l and go back to the step 1. 

The kernel of the algorithm above satisfies the detailed balance condition. The 
proof is a special case of the development that we will present in Section [3~2| 
using the probability a(x, y) in Eq. ([3]). 



2.2 Special case: standard MTM algorithm 

Choosing the weight functions with the specific analytic form 
Ui(yi,x) = p(yi)TTi(x\yi)Xi(x,yi), 



(4) 



with \i(x,yi) — Xi(yi,x), i = 1,...,N, we obtain the MTM scheme proposed 
in [4] (with different proposals) . Indeed, note that the acceptance function ^ 
can be also expressed as 



a(x, y) — min 



p{x)ir k {y\x) u k {y,x) J2f =1 Vj(x*,y) 



1, 



and using the weight choice in Eq. Q , 



a(x, y) = min 
then it is simplified 



p(y)ir k (x\y) p(x)ir k (y\x)\ k (x,y) Y J3 =i u >j(.yji x ) 
' p{x)ir k (y\x) p(y)Tr k (x\y)X k (y, x) J^f =1 uj,{x*,y) 



a(x, y) = min 



Ef=i^(a;*,y) 
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Finally, observe that if we use just one proposal, m(y\x) = 7r 2 (y|a;) = ... = 
t^n{v\x) and the same functions Xi(x,y) = \2(x,y) = ... = Ajv(x,2/), we 
obtain the standard formulation of the MTM P3]. Figure[l]represents a general 
scheme of the algorithm described in Section [O] 




Reference points 



x* ^ TTi(x*\y), i^k, x* k =x 



a(x, y) = min 



P{y)nk(x\y) W x 



p{x)ir k (y\x) W y 



Xt+1 



Fig. 1 Sketch of the MTM algorithm with generic weights and different proposals described 
in Section I2TT1 



2.3 Important observations 

It is important to remark that, in order to obtain a fair comparison among 
the generated candidates, in the computation of the weights, it is advisable 
to use proposal functions with the same area below, i.e., iri(yi\x)dyi = 
J T) TT2{y2\x)dy2 = ... = J^y tttv (yN \ x)dyN > for instance they can be normalized. 
This is not strictly needed but recommendable. 



Moreover, it is possible to show (see Section 3.2) that the algorithm above 



works owing to a(x,y) satisfies the following equation 

p(x)TT k (y\x)W y a(x, y) = p(y)n k (x\y)W x a(y, x). (5) 

Note that < W y < 1 and < W x < 1 are probabilities and functions of x, 
y, the remaining points yi and x*, then a more appropriate notation would 
be W y (y 1 , ...,y k = y, ...,y N ,x) and W x {x\, ...x* k = x, x* N , y)Q However, for 



1 Recall that yi are drawn from TTi(-\x) whereas x* are drawn from ni(-\y), i = 1, ...,N. 



6 



Luca Martino, Jesse Read 



simplicity we maintain the notation W y and W x . In the sequel, we suggest 
different acceptance functions a(x, y). 

3 Flexibility of the acceptance function 

Here, we introduce different multiple try MH approaches with generic weights 
functions. Specifically we show how to design different suitable acceptance 
functions a(x, y) fulfilling the detailed balance condition. Indeed, it is possible 
to choose functions a{x,y) with the form 

a(x,y) = P(x,y)j(x,y\x*_ k ,y_ k ), 

where 

1. (3(x, y) is such that 

p(x)ir k (y\x)j3(x,y) = p(y)-K k (x\y)P{y,x), Vfc G {1, ...,N}, (6) 

2. 7(x, y\x.*_ k ,y_ k ) satisfies 

W y -y(x,y\x*_ k ,y- k ) = W x j(y,x\y^ k ,xL k ), (7) 

where x* fc = [or*, ...x*^-^, x* k+1 , ...,x* N ] and y_ fc = [y 1 , ...y k _t,y k+1 , ...,y N ]- 

3. Finally we need 

0<a(x,»)<l. (8) 

If the Eqs. (|6| and ^ are jointly fulfilled then the condition ^ also holds, 
i.e., the equation 

p(x)ir k (y\x)W y a(x, y) = p(y)ir k (x\y)W x a(y, x) 

is satisfied. Equation Q can be easily obtained choosing separately < 
P{x,y) < 1 and < 7(35, y\~x*_ k , y_fe) < 1. Moreover, in this case, Eq. ^ 
is exactly the balance condition of the standard MH algorithm, then we can 
choose any acceptance functions suitable for the standard MH algorithm as 
function /3(x, y). Similar considerations can be used to design suitable functions 
7(2;, y\x*_ k ,y_ k ). Some examples are provided in Section |4] 

3.1 Algorithm 

The novel scheme can be summarized as follows: 

1. Draw ./V samples from the proposal pdfs yj ~ TTj(-\x), with j = 1, N. 

2. Calculate the weights uij(yj, x), j = 1, N, and normalize them to obtain 
Qj, j = 1, ...,N. 

3. Draw a y = y k £ {j/i, ....,j/at} according to Qj, j = 1, ...,N and set (recall 
that y k = y) 
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4. Draw other auxiliary samples x* ~ ni(-\y) for i = 1, fc — 1, fc + 1, iV, 
and set x\ = x. 

5. Compute the corresponding weights u)j(x*j,y), j — 1,...,N and set (recall 
that Xf. = x) 

w - u k( x >y) 

Ei=iWi(ay.tf) 

6. Let it+i = y (recall that y = y&) with probability 

a(x,y) = ^(x,y)7(x,j/|x* fc ,y_ fc ), 

where 

p(a ; ) 7r fc(yk)/3(2 ; ; 2/) = p(y)7Tfc(2 : l2/)/3(y, 

and 

W^7(a;,y|x* fe ,y_ fe ) = W^7(y,^|y_ fe ,x* fc ). 

Otherwise set Xt+\ = x with the remaining probability 1 — a(x,y). 

7. Set t = t + 1 and go back to the step 1. 

3.2 Balance condition 

To guarantee that a Markov chain generated by an MCMC method converges 
to the target distribution p (x) oc p(x), we can prove that the kernel A(y\x) 
of the corresponding algorithm (probability of accepting a generated sample 
y given the previous state value x) fulfills the following detailed balance 
condition^] [TJ 

p(x)A(y\x) =p(y)A(x\y). 

First of all, we need to write down the kernel A(y\x). We consider x ^ y, since 
the case x = y is trivial (indeed, in this case A(y\x) is proportional to a delta 
function 5(y — x) [131 [22)- The kernel (for x ^ y) can be expressed as 

N 

Mv = Vk\x) = ^2 h (y = yk\x,k = i), 

i=l 

where h(y = yk\x, k = i) is the probability of accepting the new state x t +\ = yu 
given the previous one Xt = x, when the chosen sample yk is the i-th candidate, 
i.e., when y^ = yi- However, since the yi are exchangeable, for symmetry we 
have h(y — yt\x,i) = h{y = yk\x 7 j) Vi,j € {1,...,N}. Hence, we can also 
write 

Mv = Vk\x) = N ■ h(y = yk\x,k), 

2 Note that the balance condition is a sufficient but not necessary condition. Namely, the 
detailed balance ensures invariance. The converse is not true. Markov chains that satisfy the 
detailed balance condition are called reversible. 
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where k £ {1, ...,N} and we recall N is the total number of proposed 
candidates y^. Then, we need to show that 

p(x)h(y\x,k) =p(y)h(x\y,k), 

for a generic k £ {1,...,N}. Following each step of the algorithm above, we 
can write 



p(x)h(y = y k \x,k) 
p(x) 



N 



Y,i=i u i(yi,x) 



N 



P(x, y)j(x, y\x*_ k ,y-k) dyx:k-idyk+i:Ndx\. k _ 1 dxl +l . N . 

S v ' 

a(x,y) 

Note that each factor inside the integral corresponds to a step of the method 
described in the previous section. The integral is over all auxiliary variables. 
Since we consider y = yk and recalling the definition of W y in Eq. (jlj , we can 
rewrite the expression in this way 

p(x)h(y\x,k) = 



p(x) 



v Jv 



nk(y\x) 



N 



w v 



N 



n «ii$\v) 



• P{x, yh(x, y\x*_ k ,y_ k ) dy 1 . k _idy k+ i :N dxl. k _ 1 dx% +1 . N 
and we only arrange it, obtaining 
p(x)h(y\x,k) = 



N 



N 



3 = l;j^k 



(9) 



n ^o\vov 

■p(x)TT k (y\x)f3{x,y) ■ W y j(x,y\x*_ k ,y-k) dy- k dx*_ k . 
Therefore, since we assume (see Eqs. @ and Q) 

p{x)^k{y\x)P{x, y) = p(y)ir k (x\y)p(y, x), 

and 

Wyj(x,y\x*_ k ,y-k) = W x ^(y,x\y-k,x*_ k ), 

it is straightforward that the expression in Eq. ^ is symmetric in x and y. 
Indeed, we can exchange the notations of x and y, and x* and yj, respectively 
and the expression does not vary. Then we can write 

p(x)h(y\x, k) = p(y)h(x\y, k). 



On the flexibility of the design of Multiple Try Metropolis schemes 9 

Since we have assumed a generic k and A(y = yu\x) = h(y = yk\x, k), it 
possible to assert that 

p(x)A(y\x) = p(y)A(x\y), 

that is the balance condition. Therefore, the Markov chain generated by the 
algorithm, described in the previous section, converges to our target pdf. 



4 Examples of functions a(x,y) 

In this section, we provide some suitable acceptance functions a(x, y) — 
VxD —> [0,1], that satisfies the condition The easiest way is to 
obtain a(x, y) is to design separately suitable functions < f3(x,y) < 1 and 
< 7(x,y|xl fe ,y_ fc ) < 1. 



4.1 Possible choices of f3(x,y) 

To design a function f3(x,y) such that < /3(x,y) < 1 and 

p{x)Kk{y\x)P{x, y) = p(y)TT k (x\y)f3(y, x), 

we can choose any acceptance rule suitable for the standard MH algorithm 
[TJ[TT]. Hence, for instance, we can choose the classical acceptance rule of the 
MH algorithm, i.e., 



/3i(x,y) = min 



1 p(y)nk(x\y) 
' p(x)irk(y\x) 



(10) 



Other possibilities are summarized in Table[T]where X(x, y) is a symmetric non- 
negative function (i.e., X(x, y) > and X(x, y) = X(y, x) for all (x, y) € T> xT>) 
such that < P(x,y) < 1. 



Table 1 Example of suitable functions 0(x, y) 



Functions (3(x,y) 


References 


3i (r v) - min 1 P(2') 7r *( x lf) 
Pi(x,y) - mm i, p(x) ^ k{y y x) 


rniin] 


a. ( T v \ - p(y)ir k (x\y) 


12 






EH 






Pb{x,y) - p{x) ^ k(y \ x) 


H3JE2] 


Pe{x,y) - 7Tkiylx) 


[T3l Chapter 5] 


8 7 (x,y) = ^\y)^y) 


[T3l Chapter 5] 



10 



Luca Martino, Jesse Read 



Moreover, defining 

= p(.y)Mx\y) 

p{x)ir k (y\xY 

and considering a function F(d) : M + — > [0, 1] such that 

then it is possible to define a general acceptance function [9l [10] 

/3 s (x,y) = (FoR)(x,y) = F(R(x,y)). 



For instance, if = min[l, #] we obtain Eq. (pL0|> and if = we 

find or /?3 with \(x,y) = 1 (see Table [T]). In |2U) there is a comparison of 
different acceptance functions in a standard MH algorithm. 



4.2 Possible choices of j(x, y\x*_ k) y~ k ) 

In this section, we provide some examples of suitable function j(x, y\x*_ k , y- k )- 
We need functions 7(2;, y\x*_ k , y_fc) such that 



W y j(x,y\x*_ k ,y- k ) = W x j(y,x\y- k ,x*_ k ), 



(11) 



where 



u k {y,x) 



, and 



Therefore, for instance, it is possible to choose 



7i(a;,y|x* fe ,y_ fe ) = W x 



Indeed, in this case 7(2/, x|y-fc, x* fe ) = W y and the condition (111 is satisfied 
(W y W x = W x W y ). Another possibility is to define 



7 2 (a;,2/|x* fc ,y_ fe ) 



W» + Wy ' 



or 



73(a;,y|x* fc ,y_fc) = min 



1. 



W v 
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5 MTM without drawing reference points 



The previous considerations also suggest how it is possible to design a MTM 
that avoids sampling the reference points xl fc . For some authors generating 
the reference samples is considered a drawback of the MTM schemes, since 
N — 1 samples are only drawn to fulfill the balance condition [23 . To avoid 
this step, the MTM method in Section pO] can be modified as follows: 



1. Given a current state Xt = x, draw N samples yi-N — [2/1,2/2, ■■■iVn] from 
the joint pdf 

q{yv.N\x) = 7Ti(?yi|a:)7r2(2/2|a;)7r2(2/3|x) • • • TT N (y N \x), 

namely, draw yj from itj(-\x), with j = 1, N. 

Calculate the weights u)j(yj, x), j — 1, N, and normalize them to obtain 



2. 

3. Draw a y = y k £ {y u 



Uj, j = 1, ...,N. 



IT", 



2/at} according to uij, j = 1, ...,7V and set 



(12) 



4. Set x 
5 



2/j for i = l,. 



, k — 1, k + 1, N, and set x* k = x. 



Compute the corresponding weights ujj(x*,y), j = 1, . 
Xk* — x) set 



w x 



y (recall that 2/ = 2/fc) with probability 



6. Let Xt+i 



a(x,y) 



1 P(y)UtiMx*\y) W x 



, N and (recalling 
(13) 

(14) 



otherwise set x t +\ — x with the remaining probability 1 — a(x,y). 
7. Set t — t + 1 and go back to the step 1. 

The differences w.r.t. the standard MTM method are contained in the steps 
4 and 6. In this case the vectors y = [y±, ...,2/fc = 2/, ■•■■,2/A r ] an d x* = [x* = 
2/1, — j x k = x i X N 



2/at] differ only in the position k, i.e., y_£ = x* 



Hence, note that a(x,y) can be expressed as 



a(x,y) 



(15) 



However, although this scheme satisfies the balance condition as we show 
below, observing the expression of a, a drawback could seem evident: since the 
samples 2/i : jv are drawn from iTi(-\x), i = 1,..,N, the product Ylijtk 7r i(2/il a; ) 
would be "often" greater then Yii^k n i(yi\y)- That is to say, x is more "likely" 
than y given the "observations" y-i, i 7^ k. Therefore, a(x, y) would be "often" 



12 



Luca Martino, Jesse Read 



less than 1 so that accepting a jump becomes "rare f] This issue would increase 
with N — > +00. However, the numerical simulations (see Section [6]) show 
that the probability a(x, y) first surprisingly increases for small values of N 



(owing to the factor and then decreases with N — > +00 as expected. The 

W y 

performance gets worse with N — > +00. Hence this scheme appears, in general, 
useless. These considerations above explain as, in the standard MTM version 
[14] , the authors introduce the idea of randomly generating the reference points 



x*. However, there is an important special case that we show in Section 5.2 



5.1 Balance condition 



Again we must check that the detailed balance condition p(x)A(y\x) — 
p(y)A(x\y) is fulfilled. The kernel A{y\x) (for x 7^ y) can be expressed, also in 
this case, as A(y — yk\x) — N ■ h(y — yk\x,k), where k £ {1, —,N} and N 
is the total number of proposed candidates yi. Then, finally we have to show 
that 

p(x)h(y\x,k) =p(y)h(x\y,k), 

for a generic k £ {1, N}. Following each step of the MTM algorithm without 
reference point, we can write 



p(x)h(y\x, k) = p(x) / • ■ ■ / 



N 

n Ti(j/i|a;) 
i=l 



dyi:k-idyk+l:Ndxl :k _ 1 dxl +1 . N . 



The integral is over all auxiliary variables. Just by rearranging, we obtain 

p(x)h(y\x,k) = ■ I 
Jt> Jd 



p{x)Y[^ l {y l \x)W y ,p(y)Y{-K l {x*\y)W x 



dyi:k~ldyk+i:Ndxl :k _ 1 dxl +1 . 



(16) 



Recalling that x* = yj for j = 1, .., k — 1, k + 1, ..,N, x* k = x and yk = y, the 
Eq. ( 16 ) can be rewritten as 



/ v 

N 



p(x)h(y\x,k) = ■ I 

N 

i\x)Wy,p{y)n k (x\y) Y\ ^t{yi\y)W x dy 1 . k _ 1 dy k+ i:N- 

i^k 

Therefore it is straightforward to see that we can exchange x and y without 
varying the expression above (see also Eq. ( 12 ) and ( 13 )), then p(x)h(y\x, k) — 
p(y)h(x\y, k) and the balance condition p(x)A(y\x) — p(y)A(x\y) is satisfied. 



3 However, it is important to remark that high acceptance rates are not a suitable indicator 
of good performance since, in general, the best acceptance rate is different from 1 |24j . 
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5.2 Independent proposal pdfs 



If the proposal pdfs are chosen as independent densities, i.e., ^i{yi \x) — 7Ti(j/i), 
^2 (j/2 1^) = ^{Vi)--- ' k n{vn\x) = ^n{un), the algorithm is simplified. Indeed, 
the a(x,y) probability in Eq. (|15[) , i.e., 



a(x,y) 



now it can be rewritten as 



l p(y)^k(x\y) U^kMy^lv) w x 

'p(x)n k (y\x) ^(y^Wy 



a(x,y) 



p(yW{x)Ui^k^(y^ w x 
, 'p(^)7Tfc(y)n^fc 7r »fe) w v 



p(y)ir k {x) Wx 
' p(x)TTk(y) W y 



Observe that it is exactly the probability a(x,y) obtained in Eq. ([3| using 
independent proposals. Therefore, here, the conclusion is different from the 
general case: it is not necessary to draw reference points when independent 
proposal densities are used. It is necessary just to set deterministically x* = yt 
for i = 1, k — 1, k + 1, ....,N, and set x* k — x. This special case, when the 



weights are chosen as in Section 2.2 is also discussed in [T31 Chapter 5]. 

Figure [2] depicts the scheme of a MTM with generic weights and different 
independent proposal pdfs, whereas Figure [3] shows virtually the simplest 
MTM algorithms, using the same independent proposal to draw the N 
candidates and importance weights (Fig. [3]ja)) or weights proportional to the 
target (Fig. [3]jb))j^]ln this special cases, the analysis of the algorithm is also 
simpler. Indeed, for instance, consider the case in Fig. [3](a). The acceptance 
probability can be expressed as 



t(x,y) = 



u>(x) 



J2^k uj (y l ) 



where w(yi) — . Note that, in this case clearly a(x,y) — ?► 1 as N — > oo, 
since the chosen candidate is "extremely good" using the importance sampling 
principle, when N — > oo. 



6 Numerical simulations 

In this section, we provide numerical results comparing different MTM 
approaches: using random walks or independent proposal pdfs, without 
drawing the reference points and using different acceptance functions. All the 
results have been averaged over 2000 runs and they are obtianed generating 
5000 iterations of the Markov chain. 

4 Another simple MTM scheme is the "orientational bias Monte Carlo" [8] Chapter 13]. 
In this case, the proposal pdl must be symmetric, i.e., Tr(y\x) = n(x\y), and the weights 
must be proportional to the target, i.e., ui(yi) = p{yi), i = 1, N. 
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^^n(vn,x) 





y = 


z Vk 




a(x, y) = min 


' p{x)Tr k (y) W y \ 








X = Xt+l 





Fig. 2 Scheme of MTM algorithm with generic weights and different independent proposal 
pdfs. 




vn yi 



"(2/1) = 



P(VN 



y = yk 




w(yN) =p(_m) 



a(x,y) = min 



1 ^fa/) + E^^fa) 



a{x, y) = min 



(a) 



(b) 



Fig. 3 Sketch of the simplest MTM schemes using just one independent proposal density, 
(a) with importance weights and (b) weights proportional to p(x). In these cases, clearly 
a(x, y) — > 1 as N — > oo. 



6.1 Random walk proposal densities 

Let X € K be a random variable^] with pdf 

p {x) cx p{x) = exp {-(x 2 - 4) 2 /4} 



(17) 



5 Note that, in this work, we have considered scalar variables only to simplify the 
treatment and the notation. All the considerations and algorithms contained in this work 
are also valid for multi-dimensional variables. 
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We want to draw samples from p (x) using different MTM schemes. We 
generate tries from a Gaussian proposal with variance a 2 and the mean 
depends on the previous state x of the chain, i.e., 

7 r(y|x)(xexp|- (2/ ^|. (18) 

We apply MTM methods using the proposal above, different number of 
candidates N = 1,2,5,100,1000 and different standard deviation a = 2,10. 
Importance weights oj(yi,x) = ^pra ar e used to select a good candidate. 
Observe that an MTM with N = 1 is exactly a standard MH algorithm. 
We also apply different MTM techniques without drawing the reference 
points (denoted as "MTM- without" ) described in Section [5] Tables [2] and [3] 
summarize the numerical results in terms of averaged probability of accepting 
a movement and linear correlation between the state Xt and Xt+\- 



Table 2 Numerical results (proposal as random walk, cr = 2, using importance weights). 



Technique 


Number of tries 


Acceptance rate 


Linear correlation 


standard MH 


TV = 1 


0.3002 


0.9053 


(MTM with TV = 1) 








MTM-rw 


TV = 2 


0.4363 


0.8397 


MTM-rw 


TV = 5 


0.6046 


0.6989 


MTM-rw 


TV = 100 


0.8647 


0.1892 


MTM-rw 


TV = 1000 


0.9557 


0.0513 


MTM-without 


TV = 2 


0.4229 


0.9160 


MTM-without 


TV = 5 


0.5121 


0.9568 


MTM-without 


TV = 100 


0.1902 


0.9978 


MTM-without 


TV = 1000 


0.0036 


0.9993 



Table 3 Numerical results (proposal as random walk, a = 10, using importance weights). 



Technique 


Number of tries 


Acceptance rate 


Linear correlation 


standard MH 
(MTM with TV = 1) 


TV = 1 


0.0991 


0.9085 


MTM-rw 


TV = 2 


0.1795 


0.8335 


MTM-rw 


TV = 5 


0.3483 


0.6700 


MTM-rw 


TV = 100 


0.8373 


0.1676 


MTM-rw 


TV = 1000 


0.9483 


0.0522 


MTM-without 


TV = 2 


0.1810 


0.8376 


MTM-without 


TV = 5 


0.3575 


0.7017 


MTM-without 


TV = 100 


0.4453 


0.9264 


MTM-without 


TV = 1000 


0.2612 


0.9952 



It is important to remark that high acceptance rates are not a suitable 
indicator of good performance since, in general, the best acceptance rate is 
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different from 1 |24j . Therefore, better performance is indicated by smaller 
correlations. We show also the acceptance rates because of the MTM method 
(drawing the reference points) presents a behavior typical in adaptive MCMC 
algorithms where the adaptive proposal pdf convergence to the true shape of 
the target [TS] : the acceptance rate grows and the linear correlation decreases 
quickly as N — > +00. Indeed, we can observe that, in both cases a = 2, 10, the 
correlation obtained with the MTM decreases to zero as N — > +00. Without 
drawing the reference points, the resulting algorithm is totally useless for a = 2 
(Table [2| whereas it outperforms the standard MH for N — 2 and N — 5 for 
a = 10 (Table [31. However, increasing N the performance gets worse. The 
results in Table [3j suggest that it exists an optimal number of tries for an 
MTM scheme without generating randomly the reference points. However, the 
MTM method with the additional cost of the random generation of reference 
points always outperforms the general scheme described in Section [5] With 
independent proposal pdfs this is not true as we show later. 



6.2 Different choice of the weights 



Considering the same target pdf in Eq. ( 17 ) , the Gaussian proposal with a = 10 
in Eq. ( 18 1 (random walk) and using N — 100 tries, we have compared the 
performance of different weight functions. Table [4] summarizes the results. 



Table 4 Numerical results (proposal as random walk, a = 10, N = 100 tries). 



Weights 


Acceptance rate 


Linear correlation 


Ui(Vi,x) = pl > Vi } 

tvyM ) -Ki{yi\x) 

importance weights 


0.8373 


0.1676 


Ui(Vi,x) = p(yi) 


0.8374 


0.1959 


Ui(yi,x) = 1 


0.0988 


0.9090 


Ui(Vi,x) = y/p(Vi) 


0.7036 


0.3340 


Wi(l/t,a:) = [p(j/i)] 2 


0.6870 


0.3093 


Ui(yi,x) = \p(yi)f 


0.4476 


0.4020 


Ui(yi,x) = TTi(x\yi) 


0.1348 


0.8809 


ujAyi.x) = — , 1 1 , 


0.0365 


0.9652 


Ui{y t ,x) = p(yi)ni(x\yi) 


0.8371 


0.2248 



The best results are provided by the importance weights u)i(yi,x) = 
J^ll\ x ) ■ The weights of the form u>i(yi,x) = p(yi) and u>i(yi,x) = p (yi)wi(x \yi) 
also yield small correlation. Clearly, the choice 0Ji(yi, x) = 1 produces the same 
results of a standard MH since the selected candidate is chosen uniformly 
among the set of drawn tries yi, i = 1, JV, without using any information of 
the target or the proposal functions. 
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6.3 independent proposal densities 

In order to draw samples from the target in Eq. ( fl7| ), we also apply MTM 
algorithms with independent proposal densities (MTM-ind) as 

Tr{y) cx exp i j- , 

with a = 10. In a first scheme, we generate N = 100 candidates from 
one proposal with fi — 0. Moreover, in other scheme, we use two different 
independent proposal pdfs with fii = —10 and \ii = 2. In this case, we draw 
N/2 = 50 tries from each one. We apply these schemes with importance 
weights, uJi(yi,x) = p ^ Vz ) , , and also with weights just proportional to the 
target pdf, oj i (y il x) = Table p] shows the numerical results. 



Table 5 Numerical results (cr = 10, N = 100 tries). 



Proposal pdfs 



Acceptance rate 



Linear correlation 



MTM-rw with 

m(vi,x) - — — - 



0.8373 



0.1676 



MTM-rw with 
Ui(yi,x) =p(yi) 



0.8374 



0.1959 



MTM-ind with 
one proposal pdf (fi = 0) and 



0.9760 



p(vi) 



0.0252 



MTM-ind with 
one proposal pdf (fi = 0) and 
Ui(Vi,x) = p(yi) 



0.9751 



0.0267 



MTM-ind with 
two proposal pdfs (^i = — 10 and fi2 = 2) 

and un{ yi ,x) = n ^ x) 

MTM-ind with 
two proposal pdfs (fii = — 10 and fi2 = 2) 
and (Ji(yi,x) =p{Vi) 



0.7420 



0.7509 



0.2748 



0.6622 



The first two lines of the Table [5] recall the acceptance rates and the linear 
correlations using the random walk proposal densities. The table shows that 
the MTM with independent proposal with n — provides the best results, i.e., 
the smallest correlation. However, the results depend strongly on a suitable 
tuning of the parameter /i. Also in this case, the importance weights seem 
to provide better results. Another important consideration is that, using two 
proposal pdfs, the MTM has selected a candidate generated from the proposal 
with /ii = —10 with a rate of 39.5% using importance weights, and just 1.5% 
with the weights proportional to the target. This observation can be extremely 
important to design an adaptive strategy where the best proposal density is 
chosen among of a set of proposals. 
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6.4 Heavy tails 

In order to analyze the performance of the MTM schemes with heavy tails, 
now we consider as target pdf the so-called Levy distribution for non-negative 
random variable, namely, 

Po (x) oc p(x) = {x ^ f/2 ex P (-^^)) - ^ > V > 0. (19) 

The normalizing constant — , such that p {x) — —p(x), is analytically known, 
c~ = Moreover, given a random variable X ~ Po(x), all the moments 

^[X 7 ] with 7 > 1 do not exist owing to the heavy tail characteristic of the 
Levy distribution. 

Our goal is to estimate the normalizing constant — via Monte Carlo 
simulation, when r\ = and v = 2, generating 5000 iterations of the 
Markov chain. We apply three different MTM techniques with N = 1000 
tries (drawing the reference points) and using importance weights to choose 
a suitable candidate each step. In the first two schemes (MTM-ind), we use 
an independent proposal ir{xt) oc exp{(x t — /i) 2 /(2cr 2 )} with /i = 10, 100 and 
er = 50, whereas, in the last one (MTM-rw), we use a random walk proposal 
7r(x t |x t _i) oc exp{(ai t — x t -\) 2 / (2a 2 )} with a — 50. We choose huge values of 
a due to the heavy tail feature of the target. We have averaged all the results 
over 2000 runs and they are summarized in Table [6j The real value of ^- when 

v = 2 is = 0.5642 



Table 6 Estimation of the constant 




= 0.5642 and standard deviation of the 



estimation (N = 1000 tries). 



Technique 


Estimation 
of i 

c p 


Std of 
the estimation 


Further informations 


MTM-ind 


0.6056 


0.0012 


fi — 10, cr — 50 


MTM-ind 


0.5994 


0.0010 


ft = 100, cr = 50 


MTM-rw 


0.5819 


0.0050 


cr = 50 



6.5 Different acceptance probabilities 

In this section, we consider again the target density p {x) oc p(x) = 
exp { — (x 2 — 4) 2 /4}, and we generate candidates from a random walk Gaussian 

density with a = 1, i.e., Tr(y\x) oc exp / — ^~ x ^ | . We choose as weight 

6 We do not provide the estimated linear correlation because of the moments (as the 
mean, for instance) of the target do not exist, and it makes difficult a right estimation of 
the correlation. 
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functions oj(x,y) = [p(x)] e , with 8 — 1/2. Note that they cannot be obtained 
using the analytic form necessary in the standard MTM [14] . Moreover, we 
consider four possible combinations of the j3(x,y) and j(x,y) functions 



Oil,i{x,y) = /3i(x,y)7i(a:,2/), 
ai,2{x,y) = Pi(x,y)j 2 {x,y), 
ai,3{x,y) = Pi(x,y)j 3 (x,y), 
a2,3(x, y) = fo{x, y)j 3 {x, y), 



4.1 



where each Pi(x, y), i = 1, 2, and jj{x, y), j — 1, 2, 3, are defined in Sections 
and[l2] Then, we run the different MTM algorithms with N = 10 and N = 100 
candidates. Table [7] shows the acceptance rate (the averaged probability of 
accepting a movement) and normalized linear correlation coefficient (between 
one state of the chain and the next) averaged over 2000 runs and obtained 
with the different techniques where N = 10. 



Table 7 Numerical results with N = 10. 



Function a 


Acceptance rate 


Linear correlation 




0.11 


0.99 


"1,2(2;, y) 


0.32 


0.98 


ott,3{x,y) 


0.55 


0.97 




0.33 


0.98 



Table 8] illustrates the results using N = 100. We observe that 0^3 
provides that greatest acceptance rate and lowest correlation in both cases. The 
acceptance rate of a\ t \ decreases with N = 100 because of 71 (x, y\x*_ k , y_fc) = 
W x diminishes with the number of tries TV. Moreover, the correlation appears 
(almost) invariant with the number of tries N . 



Table 8 Numerical results with N = 100. 



Function a 


Acceptance rate 


Linear correlation 




0.01 


0.99 




0.33 


0.98 


"1,30,?/) 


0.59 


0.97 


02,3 


0.35 


0.98 



Better performances can be attained using the acceptance function of [H] 
and rewritten in Eq. pj), as expected analyzing the analytic form of the 
different acceptance functions. Indeed, we obtain acceptance rates of 0.74, 
0.81 and correlation 0.96, 0.96 with N = 10 and N = 100, respectively. 
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7 Discussion 



In this work, we have studied the flexibility in the design of MTM techniques. 
We have introduced an MTM with generic weight functions (the analytic form 
can be chosen arbitrarily) and different proposal densities (each candidate 
can be drawn from a different pdf) combining the algorithms in [3] and 
[19) . Moreover, we have proposed a general framework for construction 
of acceptance probabilities in the MTM schemes, providing also specific 
examples. Finally, we have also designed a MTM algorithm without the need of 
generating randomly the reference points [23]. We have proved that the novel 
techniques satisfy the detailed balance condition, and carried out numerical 
simulations. Observing the theoretical workings and the numerical results, we 
can infer the following conclusions and observations: 

1. General considerations: The classical MTM method, proposed in [14] . 
clearly outperforms the standard MH algorithm using the same proposal 
pdf, in the sense that as the number of candidates increases, N — > oo, 



then the correlation decreases quickly to zero (see Section 6.3 for further 
considerations). If a designed MTM scheme does not fulfill this property, 
then it is totally useless since the computational cost increased but the 
performance is not improved. Suitable MTM methods can be applied 
efficiently to any kind of target distributions (bounded or unbounded, with 



heavy tails or not), as shown in our numerical simulations (see Section 6.4 1. 
Choice of the weights: the possibility to choose any bounded and positive 
weight functions makes the MTM scheme easier to be designed since the 
user should not check any conditions to use suitable weights (as to check 
symmetry of the function A, for instance) independently of the choice of the 
proposal pdfs. Namely, the proposal distribution and the weight functions 
can be selected separately, to fit well to the specific problem and to improve 
the performance of the technique. Note that, in some MTM approaches the 
symmetry condition of the function A can be complicated, see for instance 

PE3HI]. 

Further theoretical or numerical studies are needed to determine the best 
choice of weight functions given a certain proposal and target density. We 
find that the weights of the analytic form proposed in [T3] (see for instance 
Eq. Q) usually provide better results. Within this class, the importance 
weights u>i(yi) = ^(y'^x) > based on the importance sampling principle 
[T31 [22], appear to be a good choice in theory. Numerical results also suggest 
that weights simply proportional to the target density u>i(yi) = p(yi) 
can provide good performance. In [2] the authors note that importance 
weights place higher probability on selecting candidates that are further 
away from the current state of the chain, but finally they prefer to use 
weights proportional to the target density based on numerical results. 
If the evaluation of the target p(x) is computationally expensive such 
that the target function can not be included in the calculations of 
the weights, then the weight functions of the analytic class u>i(jji,x) = 
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p(yi)7Ti(x\yi)\(x, yi) proposed in [14] cannot be used. Indeed, it is 
impossible to find a symmetric function \{x,y) — X(y,x) in order to 
remove the dependence on p(x) in the weights (in this case there is just one 
possibility that p(x) is constant, i.e., p(x) = p{y) for all x,y € T>). In this 
case, a possible choice of the weights can be proportional to the proposal 
pdfs, namely w(yi) = Tr(x\yi) for instance. Clearly, it is not the optimal 
choice but, also in this case, the MTM can help to explore easily a larger 
portion of the sample space w.r.t. standard MH (see Section 6.2). 
Use of different proposal pdfs: a MTM scheme with different proposal 
densities can be a very powerful framework mainly to tackle applications 
with huge dimensionality and target distributions with several modes. In 
our opinion, the most promising scenario is to use different independent 
proposal distributions updating certain parameters (as mean and variance) 
each iteration of the chain, or selecting the best proposal among a set 
of functions (see Section 6.3 for further considerations). In this adaptive 
framework, the independent proposal pdfs could improved to fit better 
w.r.t. the target. This scheme has not been already exploited completely 
It is important to remark that, in order to obtain a fair comparison among 
the generated candidates, it is recommendable to use proposal functions 
with the same area below, for instance they can be normalized. 
Flexibility of the acceptance probabilities: we have shown there are certain 
freedom degrees in the design of an MTM algorithm, specifically in the 
choice of the acceptance probability a. This is also confirmed by other 
works in literature that design suitable MTM schemes with correlated 
candidates but they are quite different (the strategies in [IS] [ST] generate 
the candidates sequentially, whereas the approach in [5] uses a block 
philosophy). However, although the detailed balance condition is always 
satisfied in all cases, the performance is different. Numerical results suggest 
that a functions as close as possible to the standard MTM method |14j . 
using also the weights of the analytic form in Eq. |4]), perform better results. 
Similar considerations can be done about the standard MH algorithm 

[2 MM- 

Reference points: we have described a possible MTM algorithm without 
drawing reference points. As seen in the numerical results, in this case 
it seems to exist an optimal value of the number of candidates N. As 
N — > oo the performance becomes very poor. Therefore, we can figure out 
that the "secret" of the good performance of the standard MTM scheme 
in [8] [14] is contained in the random generation of the reference points. 
However, there exists an important special case where the reference points 
are completely unnecessary: using independent proposal densities. In this 
case, the reference points can be set deterministically, equal to the previous 
generated candidates. This scheme, using just one proposal (drawing N 
candidates from the same pdf) jointly with importance weights, appears 
as the easiest and natural procedure to combine the classical MH algorithm 
and importance sampling [22] (see Figure^ a)). 
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6. Number of candidates: All the schemes proposed in literature and also in 
this work use a fixed number of candidates N. An important improvement 
would consist on tuning adaptivcly the number TV depending on the 
discrepancy between target and proposal distributions. To do this, a 
certain measure is needed, for instance, as the effective sample size of the 
importance sampling framework [13l [22] . Clearly, this idea could be more 
effective using independent proposal pdf since it is necessary to measure 
the discrepancy between the proposal and the target functions (with a 
random walk, for instance, the mean of the proposal changes each step 
and the distance w.r.t. the target varies as well). Another possibility could 
be to combine MTM and the delayed rejection method [Hl|57j. With this 
kind of procedures, the optimal trade off between computational cost and 
performance would be achieved. 
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